IDa-Det: An Information Discrepancy-Aware Distillation for 1-bit Detectors
FIGURE 6.14
The Mahalanobis distance of the gradient in the intermediate neck feature between Res101-Res18 (gathered on the left) and Res101-BiRes18 (uniformly dispersed) on various datasets: (a) VOC trainval0712, (b) VOC test2007, (c) COCO trainval35k, (d) COCO minival.
proposal saliency maps of Res101 and Res18 (blue) is much smaller than that between Res101 and BiRes18 (orange); that is, a smaller distance indicates a smaller information discrepancy. In brief, conventional KD methods are effective at distilling real-valued detectors, but far less effective at distilling 1-bit detectors.
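Concretely, the discrepancy plotted in Fig. 6.14 can be measured as a Mahalanobis distance between gradient samples of the intermediate neck feature. The sketch below is a minimal PyTorch illustration; the function name, the (N, D) gradient layout, and estimating the covariance from the teacher gradients are our assumptions for illustration, not the exact measurement protocol of [260].

```python
import torch

def mahalanobis_distance(g_teacher, g_student, eps=1e-6):
    """Mahalanobis distance between mean teacher and student gradients.

    g_teacher, g_student: (N, D) tensors holding one flattened gradient
    of the intermediate neck feature per image. The covariance is
    estimated from the teacher gradients and regularized for stability.
    """
    diff = g_teacher.mean(dim=0) - g_student.mean(dim=0)         # (D,)
    centered = g_teacher - g_teacher.mean(dim=0, keepdim=True)   # (N, D)
    cov = centered.T @ centered / (g_teacher.shape[0] - 1)       # (D, D)
    cov = cov + eps * torch.eye(cov.shape[0],
                                dtype=cov.dtype, device=cov.device)
    return torch.sqrt(diff @ torch.linalg.solve(cov, diff))
```

Under this metric, a real-valued student stays close to the teacher, while a 1-bit student drifts, which is exactly the dispersion visible in Fig. 6.14.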
We are motivated by the observation above and present an information discrepancy-
aware distillation for 1-bit detectors (IDa-Det) [260]. This can effectively address the infor-
mation discrepancy problem, leading to an efficient distillation process. As shown in Fig.
6.15, we introduce a discrepancy-aware method to select proposal pairs and facilitate dis-
tilling 1-bit detectors, rather than only using object anchor locations of student models or
ground truth as in existing methods [235, 264, 79]. We further introduce a novel entropy distillation loss to leverage more comprehensive information than conventional loss functions. Together, these components yield a powerful information discrepancy-aware distillation method for 1-bit detectors.
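To make the selection step concrete, the hedged sketch below models each proposal feature as a channel-wise Gaussian (the φ(·) in Fig. 6.15) and ranks teacher-student proposal pairs by a symmetric KL divergence, keeping the most discrepant pairs for distillation. The helper names and the choice of symmetric KL as the discrepancy measure are illustrative assumptions, not the exact formulation in [260].

```python
import torch

def channelwise_gaussian(feat):
    """Fit a per-channel Gaussian to a proposal feature map (C, H, W);
    returns per-channel mean and variance, i.e., phi(.) in Fig. 6.15."""
    flat = feat.flatten(1)                 # (C, H*W)
    return flat.mean(dim=1), flat.var(dim=1)

def discrepancy(t_feat, s_feat, eps=1e-6):
    """Symmetric KL divergence between the channel-wise Gaussians of a
    teacher proposal and a student proposal (an illustrative measure)."""
    mu_t, var_t = channelwise_gaussian(t_feat)
    mu_s, var_s = channelwise_gaussian(s_feat)
    var_t, var_s = var_t + eps, var_s + eps
    kl_ts = 0.5 * ((var_t + (mu_t - mu_s) ** 2) / var_s
                   + torch.log(var_s / var_t) - 1)
    kl_st = 0.5 * ((var_s + (mu_s - mu_t) ** 2) / var_t
                   + torch.log(var_t / var_s) - 1)
    return (kl_ts + kl_st).sum()

def select_pairs(teacher_props, student_props, k):
    """Keep the k proposal pairs with the largest discrepancy, so the
    distillation focuses on regions where information differs most."""
    scores = torch.stack([discrepancy(t, s)
                          for t, s in zip(teacher_props, student_props)])
    return scores.topk(min(k, len(scores))).indices
```

Ranking by discrepancy, rather than relying only on student anchors or ground truth, directs the distillation budget to the proposals where the 1-bit student diverges most from the teacher.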
(Figure labels: real-valued teacher; 1-bit student; object region; false positive; missed detection; information discrepancy; entropy distillation loss; proposal distribution modeled as a channel-wise Gaussian φ(·).)
FIGURE 6.15
Overview of the proposed information discrepancy-aware distillation (IDa-Det) framework.
We first select representative proposal pairs based on the information discrepancy. Then we
propose the entropy distillation loss to eliminate the information discrepancy.
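As a rough illustration of the second stage, one can penalize the gap between the Gaussian statistics of a selected pair. The sketch below matches per-channel means together with the differential entropy of a Gaussian, H = ½ log(2πe σ²); this is our assumption for illustration and stands in for, but is not, the published entropy distillation loss of [260].

```python
import math
import torch

def entropy_distillation_loss(t_feat, s_feat, eps=1e-6):
    """Align a selected teacher/student proposal pair under the
    channel-wise Gaussian model: match per-channel means and the
    Gaussian differential entropy H = 0.5 * log(2*pi*e * var).
    A hypothetical stand-in for the entropy distillation loss."""
    mu_t, var_t = t_feat.flatten(1).mean(1), t_feat.flatten(1).var(1) + eps
    mu_s, var_s = s_feat.flatten(1).mean(1), s_feat.flatten(1).var(1) + eps
    h_t = 0.5 * torch.log(2 * math.pi * math.e * var_t)
    h_s = 0.5 * torch.log(2 * math.pi * math.e * var_s)
    return ((mu_t - mu_s) ** 2 + (h_t - h_s) ** 2).mean()
```

In training, such a term would be added to the standard detection loss with a balancing weight and applied only to the proposal pairs selected in the first stage.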